ABSTRACT

In this paper, we report the development of an error 
control scheme for the adaptive subbands excited transform
(ASET) coding algorithm [1,2]. It is a relatively simple
source-channel coding scheme which takes into account 
the sensitivity of the compression parameters to transmis- 
sion errors. The added computational complexity, encod- 
ing-decoding delay, and rate overhead are all relatively 
small. The compression algorithm, with error control and 
without, has been implemented and tested at a total bit rate 
of 16 kb/s. The channel used is the memoryless binary 
symmetric channel with error probability up to 0.01. The 
results show that under error-free conditions speech quality 
is unchanged. However, under error conditions the quality 
of speech produced by the protected algorithm is consis- 
tently better. 


I. INTRODUCTION 

The ASET compression method can be viewed as fre- 
quency-domain subband coding. A block diagram of the 
coding algorithm is shown in Figure 1. Input speech is 
buffered into blocks of N samples and normalized in ampli- 
tude. Each block is then Fourier transformed and the spec- 
trum is partitioned into K contiguous subbands. Following 
that, the spectral envelope is estimated and applied to filter 
the spectrum and to control the bit allocation. In the re- 
ceiver, some of the transmitted residual subbands are se- 
quentially translated to regenerate subbands which were 
not transmitted. The signal spectrum is then reconstructed 
and the time function recovered by inverse transformation 
and inverse amplitude normalization. Finally, a post coding 
filter is used to enhance speech quality.

In the absence of channel errors the ASET algorithm 
produces toll quality speech at rates of 12 to 16 kb/s. In 
the presence of channel errors (above 0.005 error probability),
however, the performance of the algorithm degrades
noticeably. Therefore, some error protection measures 
need to be incorporated to preserve acceptable speech 
quality under severe noise conditions. Since error protec- 
tion generally entails the transmission of redundant bits, it 
must be applied carefully. Tradeoffs between the rate and 
performance associated with the source coder and the de- 
gree of error protection need to be made. For a given 
transmission rate, the attempt is to maximize speech qual- 
ity under noisy channel conditions with minimal degrada- 
tion in error-free performance of the source coder. 


In this work, we describe a relatively simple solution 
based on a combined source-channel coding approach. 
The objective is a solution which requires small rate over- 
head yet produces appreciable improvement in speech 
quality. By providing selective error protection, the pro- 
posed solution improves the reconstructed speech quality 
without sacrificing much transmission bandwidth. To pro- 
vide the necessary protection, the design incorporates 
knowledge on the sensitivity of the reconstructed speech 
quality to errors in various compression parameters. Ac- 
cordingly, compression parameters which contribute more 
to the fidelity of the reconstructed speech receive more 
protection than others. 

To analyze the error sensitivity, we evaluated the im- 
pact of errors in each set of parameters on the perceived 
speech quality. This analysis showed that while errors in 
the normalization scale factor and the spectral envelope
parameters affect the perceived speech quality more adversely,
errors in the residual coefficients have limited effect
on quality. Moreover, it was observed that some types
of errors have a more severe perceptual effect than others.
Based on these results, we found it sufficient to use error
detection to protect the normalization scale factor and the
spectral envelope parameters, and to leave the residual
coefficients unprotected. In addition, we found that optimizing
the index assignment in the 2D quantizers improves
performance significantly without sacrificing any rate. This
optimization is done using a simulated annealing proce- 
dure. 

The proposed error control scheme was incorporated 
into the compression algorithm and will be tested using 
dedicated real-time hardware [3]. The hardware used is 
based on a single board with a single Motorola DSP56000 
digital signal processing (DSP) chip. In the current configuration,
the hardware can process a single full-duplex channel
in less than 25 percent of real time. That leaves a significant
amount of processing power for other uses, such as for
echo cancellation. Alternatively, the proposed hardware 
can multiplex and process up to four full-duplex voice chan- 
nels. 


II. SOURCE CODING ALGORITHM 

A block diagram of the coding algorithm is shown in 
Figure 1. Digitized input speech is buffered into blocks of 
N samples with an overlap between successive blocks.
Each block is then trapezoidally windowed and normalized
in amplitude. The normalization scale factor, determined 
by the peak amplitude in each block, is quantized, and its 
index t is sent to the receiver. The size of the correspond- 
ing quantizer codebook is 10. After Fourier transformation, 
the corresponding spectrum is divided into K contiguous 
subbands of n frequency coefficients each. The spectral 
envelope, assumed constant in each subband, is defined by 
the peak magnitude, p, in the subband. The set
$(p_1, p_2, \ldots, p_K)$ defines a piecewise estimate of the spectral
envelope and a frequency response for the analysis-synthesis
filter. After quantization of $\{p_i\}$, the spectral envelope
estimate is represented by a set of indices $(y_1, y_2, \ldots, y_K)$.
The size of the corresponding quantizer codebook is 16.
The spectral envelope parameters control the selection of
subbands to be transmitted as well as their bit allocation
and define the analysis-synthesis filter. They are also sent
to the receiver to be used in reconstruction of the spectrum.
The inverse filter frequency response is defined by a
set of scale factors $(\varphi_1, \varphi_2, \ldots, \varphi_K)$, where
$\varphi_i = 1/\hat{p}_i$ and $\hat{p}_i = Q^{-1}[Q[p_i]]$. An illustration, showing the spectrum of a
frame of actual speech and the corresponding quantized
spectral envelope estimate, is shown in Figure 2.

Figure 1. ASET basic coding system.
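The envelope estimate and the inverse-filter scale factors described above can be sketched in a few lines. This is a minimal illustration only: the 16-level codebook `CODEBOOK`, the nearest-level quantizer, the toy spectrum, and all function names are assumptions made for the example and are not taken from the ASET implementation.

```python
# Hypothetical 16-level codebook (log-spaced); the real ASET codebook is not given.
CODEBOOK = [2.0 ** (k - 8) for k in range(16)]


def subband_peaks(magnitudes, num_subbands):
    """Split a magnitude spectrum into K contiguous subbands and return each peak."""
    n = len(magnitudes) // num_subbands  # coefficients per subband
    return [max(magnitudes[k * n:(k + 1) * n]) for k in range(num_subbands)]


def quantize(value, codebook):
    """Return the index of the codebook level nearest to value."""
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - value))


def envelope_and_scale_factors(magnitudes, num_subbands, codebook):
    """Return the envelope indices, the quantized peaks, and the scale factors."""
    peaks = subband_peaks(magnitudes, num_subbands)
    indices = [quantize(p, codebook) for p in peaks]   # one index per subband
    quantized = [codebook[y] for y in indices]         # dequantized peak per subband
    scale = [1.0 / q for q in quantized]               # reciprocal of quantized peak
    return indices, quantized, scale


def inverse_filter(magnitudes, num_subbands, scale):
    """Scale every coefficient by the scale factor of its subband."""
    n = len(magnitudes) // num_subbands
    return [m * scale[i // n] for i, m in enumerate(magnitudes[:n * num_subbands])]


# Toy spectrum with K = 4 subbands of n = 2 coefficients each.
spectrum = [0.1, 0.5, 0.2, 0.05, 3.2, 1.0, 0.7, 0.9]
y, p_quantized, phi = envelope_and_scale_factors(spectrum, 4, CODEBOOK)
residual = inverse_filter(spectrum, 4, phi)
```

Because each scale factor is the reciprocal of the quantized peak, the receiver can rebuild the same factors from the transmitted indices alone.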

Inverse filtering is performed by scaling the magnitude
of the coefficients in each subband according to the scale
factor $\varphi$ associated with that subband. Figure 3 illustrates
the spectrum after inverse filtering for the signal shown in
Figure 2. Following scaling, a set of L subbands with an
aggregate bandwidth equal to approximately half the total
bandwidth is selected for transmission, and the corresponding
residual coefficients are quantized using nonuniform
bit allocation. To implement the nonuniform bit allocation
in a manageable way, a set of three pre-designed
two-dimensional (2D) vector quantizers of different rates is
used. The codebook sizes for these quantizers are
$M_7 = 128$, $M_6 = 64$, and $M_4 = 16$.

At the receiver, some or all of the transmitted residual 
subbands are replicated to regenerate missing subbands. 
The received set of y indices controls the inverse 2D quan- 
tization and spectral replication, and defines the synthesis 
filter. Following replication, the spectrum is reconstructed
using the synthesis filter defined by $(p_1, p_2, \ldots, p_K)$. The
time-function representation is then recovered by inverse
transformation, inverse amplitude normalization, and over- 
lapping and adding adjacent frames. Finally, a novel post- 
coding filter is used to reduce framing noise and enhance 
speech quality. 
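As a rough illustration of the receiver steps, the sketch below regenerates missing subbands from transmitted ones and then applies the synthesis filter. The replication rule used here (cycling through the transmitted subbands in ascending order) is an assumption chosen for simplicity; the paper does not specify the exact rule, and the function names are hypothetical.

```python
from itertools import cycle


def replicate_subbands(received, num_subbands):
    """Fill missing subbands with copies of transmitted ones.

    received maps a subband position to its list of residual coefficients.
    Assumed rule: missing positions take donors cyclically, in ascending order.
    At least one subband must have been received.
    """
    donors = cycle([received[k] for k in sorted(received)])
    return [list(received[k]) if k in received else list(next(donors))
            for k in range(num_subbands)]


def apply_synthesis_filter(residual_subbands, envelope):
    """Scale each residual subband by its decoded spectral envelope value."""
    return [[c * p for c in band] for band, p in zip(residual_subbands, envelope)]


# Two of four subbands were transmitted; the other two are regenerated.
received = {0: [1.0, 2.0], 2: [3.0, 4.0]}
full = replicate_subbands(received, 4)
spectrum = apply_synthesis_filter(full, [1.0, 0.5, 2.0, 0.25])
```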



Figure 2. Signal spectrum (solid) and quantized 
spectral envelope estimate (broken). 



Figure 3. Residual spectrum. 


III. CHANNEL ERROR PROTECTION 


Error Sensitivity 

The first step in designing the channel error protection 
scheme was to evaluate the error sensitivity of the com- 
pression parameters. We exposed to errors one set of 
transmitted parameters at a time and then evaluated the re- 
constructed speech quality. Three sets of parameters were 
evaluated: t, the block-amplitude normalization factor;
$\{y_i\}$, the spectral envelope parameters; and the residual
coefficients. The transmission channel used is the
memoryless binary symmetric channel with error probability
up to 0.01. The speech quality is assessed using informal
subjective evaluation tests and mean-squared error figures. 
Based on this evaluation we conclude the following: 

Reconstructed speech quality is quite insensitive to er- 
rors in the residual coefficients. Sensitivity can be fur- 
ther reduced by optimizing the index assignments in the 
corresponding quantizers without sacrificing transmis- 
sion bits. No additional protection for the residual coef- 
ficients is needed. 

Errors in t and $\{y_i\}$ affect output speech quality more
adversely. Errors in t are noticeable as "pops" caused
by frame-long jumps in the output signal level. Similarly,
errors in $\{y_i\}$ are noticeable as tonal "beeps"
caused by jumps in energy levels in some frequency 
bands. Two types of errors are most noticeable: er- 
rors which affect medium to high energy blocks or sub- 
bands, and errors which increase the energy level of re- 
constructed blocks or subbands. 

Considering the relative robustness of t and $\{y_i\}$ to
other types of error, a simple error detection scheme 
with an error substitution rule which limits the occur- 
rences of the worst-type errors may be sufficient. 

The channel bit error rate is sufficiently small so that 
we can consider only a single bit error per binary code- 
word. 
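The single-error assumption can be checked numerically. The sketch below models the memoryless binary symmetric channel and computes how often a 5-bit codeword suffers exactly one error versus two or more at an error probability of 0.01; the 5-bit length matches the protected codewords used later in this section, and the function names are illustrative. At this error rate a single error occurs in about 4.8 percent of codewords and two or more errors in about 0.1 percent.

```python
import random
from math import comb


def bsc(bits, error_prob, rng):
    """Pass a list of 0/1 bits through a memoryless binary symmetric channel."""
    return [b ^ int(rng.random() < error_prob) for b in bits]


def prob_k_errors(n_bits, k, error_prob):
    """Probability of exactly k bit errors in an n-bit codeword."""
    return comb(n_bits, k) * error_prob ** k * (1 - error_prob) ** (n_bits - k)


error_prob = 0.01
n_bits = 5
p_single = prob_k_errors(n_bits, 1, error_prob)
p_multiple = 1 - prob_k_errors(n_bits, 0, error_prob) - p_single

# Example transmission of one 5-bit codeword through the channel.
rng = random.Random(1)
received = bsc([1, 0, 1, 1, 0], error_prob, rng)
```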


Index Assignment / Simulated Annealing 

The index assignment problem to be solved is the fol- 
lowing. Given an input vector y , the quantizer searches a 
codebook C for a codeword (a reproduction vector) that 
minimizes a squared-error distance. The transmitter then 
forwards the index (binary codeword) of that codeword to 
the receiver. The length of the binary codeword is
$n = \log_2 M$, where M is the size of the codebook. As the decoder
stores a replica of the codebook, the received index,
assuming noiseless channel, determines uniquely the cor- 
rect reproduction vector. However, when channel errors 
occur, the received index will differ from the transmitted in- 
dex, and the reproduction vector will differ from the code- 
word selected at the transmitter. The average distortion 
caused by these errors is given by:

$$D(t) = \sum_{i=0}^{M-1} \sum_{j=0}^{M-1} p(c_i)\, p(t(c_j) \mid t(c_i))\, \|c_i - c_j\|^2 \qquad (1)$$

where $t = (t(c_0), t(c_1), \ldots, t(c_{M-1}))$ is the indexing scheme,
$t(c_i)$ is the index associated with the codeword $c_i$, $p(c_i)$ is
the probability of the codeword $c_i$, $p(t(c_j) \mid t(c_i))$ is the
transitional probability due to channel errors, and $\|c_i - c_j\|^2$ is
the squared-error distance between codewords $c_i$ and $c_j$.
This distortion is in addition to the distortion caused by the
quantization. The objective in the optimization problem is
to find an index assignment scheme t which minimizes the
average distortion D(t).
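To make the distortion measure concrete, the following sketch evaluates the double sum of equation (1) for a memoryless binary symmetric channel, where the transition probability between two indices depends only on their Hamming distance. The four-codeword scalar codebook and the two assignments are toy values chosen for illustration, not data from the paper.

```python
def hamming(a, b):
    """Number of bit positions in which two integer indices differ."""
    return bin(a ^ b).count("1")


def transition_prob(sent, received, n_bits, eps):
    """Probability that index `sent` is received as `received` over a BSC."""
    d = hamming(sent, received)
    return eps ** d * (1 - eps) ** (n_bits - d)


def average_distortion(codebook, probs, assignment, n_bits, eps):
    """Double sum of equation (1).

    codebook[i] is codeword c_i, probs[i] is p(c_i), assignment[i] is t(c_i).
    """
    total = 0.0
    for i, ci in enumerate(codebook):
        for j, cj in enumerate(codebook):
            sq_dist = sum((a - b) ** 2 for a, b in zip(ci, cj))
            total += (probs[i]
                      * transition_prob(assignment[i], assignment[j], n_bits, eps)
                      * sq_dist)
    return total


# Toy example: four equiprobable codewords on a line, 2-bit indices.
codebook = [(0.0,), (1.0,), (2.0,), (3.0,)]
probs = [0.25] * 4
d_natural = average_distortion(codebook, probs, [0, 1, 2, 3], 2, 0.01)
d_scrambled = average_distortion(codebook, probs, [0, 3, 1, 2], 2, 0.01)
```

Even in this small case the two assignments give different channel-induced distortion for the same codebook, which is the effect the index optimization exploits.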

Simulated annealing is a numerical imitation of the 
physical annealing process. Physical annealing is a pro- 
cess used in growing crystals or in softening steel. In this 
process, the material is heated to melting temperature and 
then cooled slowly in order to minimize the internal poten- 
tial energy. Simulated annealing can be used to obtain ap- 
proximate solutions to combinatorial optimization problems 
such as ours. By defining the index assignment t as the 
state of the simulated system and the average distortion 
function D(t) as its energy, and by slowly reducing a con- 
trol parameter representing the system temperature, we 
can seek (via random perturbations) a minimum distortion 
state. 
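A generic annealing loop of this kind can be sketched as follows. This is not the algorithm of [4]; it is a plain Metropolis scheme with pairwise index swaps, a geometric cooling schedule, and an energy that counts single-bit errors only, all of which are assumptions made for the example. The 4 x 4 grid codebook is also hypothetical.

```python
import math
import random


def single_error_distortion(codebook, probs, assignment, n_bits):
    """Distortion per unit error probability, counting single-bit errors only."""
    index_to_word = {idx: k for k, idx in enumerate(assignment)}
    total = 0.0
    for i, idx in enumerate(assignment):
        for bit in range(n_bits):
            j = index_to_word.get(idx ^ (1 << bit))
            if j is not None:
                total += probs[i] * sum(
                    (a - b) ** 2 for a, b in zip(codebook[i], codebook[j]))
    return total


def anneal(codebook, probs, n_bits, rng,
           t_start=1.0, t_end=1e-3, cooling=0.9, moves=100):
    """Search for a low-distortion index assignment by simulated annealing."""
    assignment = list(range(len(codebook)))
    rng.shuffle(assignment)                      # random initial state
    energy = single_error_distortion(codebook, probs, assignment, n_bits)
    best, best_energy = assignment[:], energy
    temp = t_start
    while temp > t_end:
        for _ in range(moves):
            a, b = rng.sample(range(len(assignment)), 2)
            assignment[a], assignment[b] = assignment[b], assignment[a]
            new_energy = single_error_distortion(
                codebook, probs, assignment, n_bits)
            delta = new_energy - energy
            # Always accept improvements; accept worse states with
            # probability exp(-delta / temp).
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                energy = new_energy
                if energy < best_energy:
                    best, best_energy = assignment[:], energy
            else:
                assignment[a], assignment[b] = assignment[b], assignment[a]
        temp *= cooling                          # slow geometric cooling
    return best, best_energy


rng = random.Random(0)
grid = [(x, y) for x in range(4) for y in range(4)]   # toy 2D codebook
probs = [1 / 16] * 16
best, best_energy = anneal(grid, probs, 4, rng)

# Baseline: mean energy of twenty random assignments.
random_energies = []
for _ in range(20):
    trial = list(range(16))
    rng.shuffle(trial)
    random_energies.append(single_error_distortion(grid, probs, trial, 4))
random_mean = sum(random_energies) / len(random_energies)
```

Restricting the energy to single-bit errors keeps each evaluation cheap, which matters because the energy is recomputed after every trial swap.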

The simulation algorithm we used is similar to the one 
described in [4] and its specifics will not be discussed 
here. The algorithm converges relatively fast and produces 
index assignments that perform significantly better than 
random assignments. To examine the consistency of the 
results we ran the algorithm several times for each of the 
three 2D quantizers. These results show that although different
runs converged to different assignments, the distortion
figures achieved are very close. The codebook sizes
for the quantizers considered are 128, 64, and 16. The corresponding
rates, in bits per dimension (b/d), are 3.5, 3, and 2.

To evaluate the results, we compare the optimized as- 
signment schemes against randomly selected assignment 
schemes. The performance of each optimal scheme is 
compared against the average performance and the worst 
performance of twenty randomly selected schemes. The 
corresponding average distortion figures normalized by the 
channel error probability are listed in Table 1. As these results
show, the optimized schemes outperform the worst
and average random schemes, respectively, by 5.11 dB
and 4.46 dB for the 3.5-b/d quantizer, by 4.31 dB and 4.10
dB for the 3-b/d quantizer, and by 3.25 dB and 2.64 dB for
the 2-b/d quantizer.

Scheme     3.5 b/d    3.0 b/d    2.0 b/d
Worst       7.54       6.56       5.23
Average     6.89       6.35       4.62
Optimal     2.43       2.25       1.98

Table 1. Normalized squared-error in dB

Consistent with these results, the
ASET algorithm performs significantly better with the opti- 
mized indexing schemes than with the random indexing 
schemes. The gains in signal to noise ratio (SNR) for the 
optimized case against the worst random case at different 
channel bit error rates (BER) are shown in Table 2. These
results show gains in global and segmental SNR of 1.75 dB
and 0.42 dB at 0.001 BER, 2.88 dB and 1.53 dB at 0.005 BER,
and 3.33 dB and 2.31 dB at 0.01 BER. That is, the improvement
in speech quality is consistent, and the higher the BER
the more appreciable it is.

SNR gain     BER 0.001   BER 0.005   BER 0.01
Global         1.75        2.88        3.33
Segmental      0.42        1.53        2.31

Table 2. Performance gains in dB
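The gains quoted for the index assignments are simple differences of the Table 1 entries. The short sketch below recomputes them and converts a gain in dB to a linear ratio, assuming the usual power convention of 10 log10 for squared-error quantities; under that assumption the 5.11 dB gain of the 3.5-b/d quantizer corresponds to roughly a threefold reduction in channel-induced squared error.

```python
# Normalized squared-error figures from Table 1, in dB.
TABLE_1_DB = {
    "3.5 b/d": {"worst": 7.54, "average": 6.89, "optimal": 2.43},
    "3.0 b/d": {"worst": 6.56, "average": 6.35, "optimal": 2.25},
    "2.0 b/d": {"worst": 5.23, "average": 4.62, "optimal": 1.98},
}


def gain_db(rate, baseline):
    """Gain of the optimized scheme over a random baseline, in dB."""
    return TABLE_1_DB[rate][baseline] - TABLE_1_DB[rate]["optimal"]


def db_to_ratio(db):
    """Convert a dB figure to a linear power ratio (assumes 10*log10)."""
    return 10 ** (db / 10)


gains = {rate: (round(gain_db(rate, "worst"), 2),
                round(gain_db(rate, "average"), 2))
         for rate in TABLE_1_DB}
ratio_35 = db_to_ratio(gain_db("3.5 b/d", "worst"))
```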


Normalization Scale Factor 

The block-normalization scale factor is quantized using 
a codebook size of 10. Using 5-bit binary codewords to index
the codebook allows for detection of all single error
patterns and some two error patterns. To decode the received
indices we use a lookup table of size $2^5 = 32$.
This table incorporates a decoding rule designed to mini- 
mize perceptual effects caused by single error patterns. 

The decoding table is pre-designed as follows. Ten bi- 
nary codewords, representing the quantizer codewords, are 
mapped to their corresponding quantizer codewords. All 
other binary codewords (erroneous codewords) are as- 
sumed to be the result of single error patterns. Each erro- 
neous codeword has a set of binary codewords (five) to 
which its Hamming distance is one. Some or all of the 
codewords in this set are assigned to quantizer codewords, 
and the erroneous index is mapped to one of these quan- 
tizer codewords. When the erroneous index is received, 
the decoding table outputs a codeword (one of the above) 
which, after inverse amplitude normalization, produces the 
lowest signal level. 

In addition, a careful selection of the binary codebook 
can further reduce the maximum distortion caused by er- 
rors. The codebook can be composed of two disjoint sets, 
each with five binary codewords, having the following prop- 
erties (see Table 3). Within a set, the Hamming distance is 
exactly two; between sets, the minimum Hamming distance 
is three. We assign one set to index the five largest amplitudes
in the quantizer and the other to index the five smallest
amplitudes. As a result, the maximum distortion due to
single error patterns is limited by the largest distance between
quantizer codewords (amplitudes) within a set.

Set A     Set B
11111     00000
00111     11000
01011     10100
01110     10001
01101     10010

Table 3. Binary Codebook for t
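The decoding table for t can be built directly from the binary codebook of Table 3. The sketch below is illustrative: the ordering of amplitude levels within each set is an assumption (only the split into five largest and five smallest amplitudes is stated in the text), and the names are hypothetical. Level 0 denotes the smallest amplitude and level 9 the largest.

```python
SET_A = [0b11111, 0b00111, 0b01011, 0b01110, 0b01101]
SET_B = [0b00000, 0b11000, 0b10100, 0b10001, 0b10010]

# Assumed mapping: Set B indexes levels 0..4 (smallest amplitudes),
# Set A indexes levels 5..9 (largest amplitudes).
WORD_TO_LEVEL = {word: level for level, word in enumerate(SET_B + SET_A)}


def hamming(a, b):
    """Number of bit positions in which two 5-bit words differ."""
    return bin(a ^ b).count("1")


def build_decoding_table():
    """Return a 32-entry table mapping every received 5-bit word to a level."""
    table = []
    for word in range(32):
        if word in WORD_TO_LEVEL:
            table.append(WORD_TO_LEVEL[word])          # error-free codeword
        else:
            # Treat the word as a single-bit error: among the valid codewords
            # at Hamming distance one, pick the lowest amplitude level.
            neighbours = [word ^ (1 << bit) for bit in range(5)]
            candidates = [WORD_TO_LEVEL[n] for n in neighbours
                          if n in WORD_TO_LEVEL]
            table.append(min(candidates))
    return table


DECODING_TABLE = build_decoding_table()
within_a = {hamming(a, b) for a in SET_A for b in SET_A if a != b}
within_b = {hamming(a, b) for a in SET_B for b in SET_B if a != b}
between_min = min(hamming(a, b) for a in SET_A for b in SET_B)
```

With this codebook every one of the 22 invalid words lies at distance one from codewords of a single set only, so a detected single error is always decoded to a level in the same half of the amplitude range as the transmitted one.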


Spectral Envelope Parameters 

The spectral parameters are quantized using a codebook
size of 16, or 4 bits per binary codeword. The error
protection scheme is basically a single parity check code. 
That is, every 4-bit binary codeword is appended with one 
extra digit such that the total number of ones in the 5-bit 
sequence is even. This code, which allows for detection of 
all single error patterns, is used to index the quantizer 
codewords. Again, to decode the received indices we use 
a lookup table of size $2^5 = 32$, and a decoding rule designed
to minimize perceptual effects caused by single error
patterns.

The decoding table is pre-designed as follows. Sixteen
indices, representing the quantizer codewords, are
mapped, when received, to their corresponding codewords. 
All other binary codewords (sixteen erroneous codewords) 
are assumed to be the result of single error patterns. Each 
erroneous codeword has a set of binary codewords (five) to 
which its Hamming distance is one. All of the codewords 
in this set are assigned to quantizer codewords, and the er- 
roneous index is mapped to one of these quantizer code- 
words. When the erroneous index is received, the decod- 
ing table outputs a codeword (one of the above) which, 
after spectrum reconstruction, produces the lowest signal 
level in the corresponding subband. 
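The parity-check code and its decoding table can be sketched in the same way. The example assumes that the 4-bit quantizer indices are ordered by increasing magnitude, so that the smallest index among the candidates yields the lowest signal level in the subband; that ordering, the placement of the parity bit in the least significant position, and the function names are assumptions made for illustration.

```python
def parity_encode(index):
    """Append an even-parity bit to a 4-bit index, giving a 5-bit codeword."""
    parity = bin(index).count("1") & 1
    return (index << 1) | parity


def build_envelope_table():
    """Return a 32-entry table mapping every received 5-bit word to an index."""
    valid = {parity_encode(i): i for i in range(16)}
    table = []
    for word in range(32):
        if word in valid:
            table.append(valid[word])                  # parity check passes
        else:
            # Odd parity: all five words at Hamming distance one are valid
            # codewords. Substitute the one with the lowest level (assumed
            # to be the smallest quantizer index).
            neighbours = [word ^ (1 << bit) for bit in range(5)]
            table.append(min(valid[n] for n in neighbours))
    return table


ENVELOPE_TABLE = build_envelope_table()
```

For example, under these assumptions the received word 11111 fails the parity check and is decoded to index 7, the smallest index among its five valid neighbours.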


IV. CONCLUSIONS 

We introduced a combined source-channel coding 
scheme to protect the performance of the ASET algorithm 
in the event of transmission errors. It is a relatively simple 
design which takes into account the sensitivity of the vari- 
ous compression parameters to transmission errors. The 
solution improves markedly the quality of speech produced 
under noisy conditions without sacrificing much transmis- 
sion bandwidth. In addition, the added computational com- 
plexity and encoding-decoding delay are relatively small. 

The compression algorithm, with channel error control 
and without, has been implemented and tested at a total bit 
rate of 16 kb/s. The simulated channel is the memoryless 
binary symmetric channel with error probability up to 0.01. 
The results show that under error-free conditions the qual- 
ity of speech produced by both algorithms is identical. 
However, under error conditions the quality of speech pro- 
duced by the error protected algorithm is consistently bet- 
ter. The compression algorithm with channel error control 
is now being implemented on our dedicated hardware for 
real-time evaluations.